---
title: "Chapter 2: The Time Series Object"
output: html_notebook
---

R provides several classes for representing time series objects in a variety of applications. Among them, `ts` is one of the main formats for time series data in R, mainly because of its simplicity and its wide adoption by the main R packages for time series analysis, such as **forecast** and **stats**.

# The Natural Gas Consumption dataset

```{r}
library(pacman)
p_load(Quandl)

ngc <- Quandl(code = "FRED/NATURALGAS",
              collapse = "quarterly",
              type = "ts",
              end_date = "2018-12-31")

class(ngc)
```

The simplest way to plot a `ts` object is with the `plot.ts()` function (calling `plot()` on a `ts` object dispatches to the same method):

```{r}
plot.ts(ngc,
        main = "US Quarterly Natural Gas Consumption",
        ylab = "Billion Cubic Feet")
```

# The attributes of the `ts` class

A regular time series is an ordered sequence of observations captured at equally spaced time intervals. When this condition no longer holds, the series becomes an irregular time series. The main characteristics of regular time series data are as follows:

-   Cycle/period: a regular unit of time that splits the series into consecutive, equally long subsets (for example, a calendar year for quarterly data)

-   Frequency: the length of the cycle, that is, the number of units (observations) in each cycle

-   Timestamp: the time at which each observation in the series was captured; it can be used as the series index

A `ts` object is composed of two elements: the series values and their corresponding timestamps.

```{r}
# number of observations
length(ngc)
```
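As a minimal sketch of these two elements (base R only, using a toy quarterly series `x` with illustrative values rather than the NGC data), the values are an ordinary numeric vector. The timestamps, by contrast, are not stored one by one. They are encoded in the `tsp` attribute as `c(start, end, frequency)` and can be rebuilt from it:

```{r}
# Toy quarterly series (illustrative values only)
x <- ts(c(10, 12, 9, 15, 11, 13, 10, 16), start = c(2000, 1), frequency = 4)

# Element 1: the values, stored as a plain numeric vector
values <- as.numeric(x)

# Element 2: the timestamps, encoded compactly as c(start, end, frequency)
tsp(x)

# Every timestamp can be rebuilt from those three numbers
rebuilt_time <- seq(from = tsp(x)[1], to = tsp(x)[2], by = 1 / tsp(x)[3])
all.equal(rebuilt_time, as.numeric(time(x)))
```

This compact encoding is also why a `ts` object can only represent regularly spaced observations.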

Printing a `ts` object displays its values arranged by cycle:

```{r}
ngc
```

Here each row represents a cycle and each column represents a cycle unit. For the `ngc` data, each calendar year is a full cycle and the quarters are the cycle units.
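You can approximate this printed layout by reshaping the values into a matrix with one row per cycle and `frequency()` columns. The sketch assumes a toy series `x` (illustrative values) that starts at the first unit of a cycle and covers only full cycles. The matrix `by_cycle` and its labels are hypothetical. Printing a real `ts` object also handles partial cycles.

```{r}
# Toy quarterly series covering two full yearly cycles (illustrative values)
x <- ts(1:8, start = c(2000, 1), frequency = 4)

# Hypothetical reshaping: one row per cycle (year), one column per unit (quarter)
by_cycle <- matrix(as.numeric(x),
                   ncol = frequency(x),
                   byrow = TRUE,
                   dimnames = list(2000:2001, paste0("Qtr", 1:4)))
by_cycle
```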

The `cycle()` and the `time()` functions from the **stats** package provide the cycle units and the timestamp of each observation in the series:

```{r}
cycle(ngc)

time(ngc)
```

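A more concise summary of this structure is given by the `frequency()` and `deltat()` functions, which return the number of observations per cycle and the time interval between consecutive observations (`1 / frequency`), respectively: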

```{r}
frequency(ngc)

deltat(ngc)
```
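`deltat()` is the reciprocal of `frequency()`: it gives the time between consecutive observations, measured in cycles. The following rough sketch uses a toy quarterly series `x` (illustrative values, base R only) to show that `time()` and `cycle()` can both be derived from the start time and `deltat()`. The manual formulas are for illustration only and rely on quarter fractions being exact in floating point.

```{r}
# Toy quarterly series (illustrative values)
x <- ts(1:8, start = c(2000, 1), frequency = 4)

frequency(x)  # 4 observations per cycle
deltat(x)     # 0.25, i.e. 1 / frequency(x)

# time(): the start time plus (i - 1) steps of deltat
manual_time <- tsp(x)[1] + (seq_along(x) - 1) * deltat(x)

# cycle(): position of each observation within its cycle, from 1 to frequency
manual_cycle <- round((manual_time - floor(manual_time)) * frequency(x)) + 1

all.equal(manual_time, as.numeric(time(x)))
all.equal(manual_cycle, as.numeric(cycle(x)))
```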

Other useful functions are `start()` and `end()`, which return the timestamps of the first and last observations as a (cycle, cycle unit) pair:

```{r}
start(ngc)

end(ngc)
```

The `ts_info()` function from the **TSstudio** package provides a concise summary of most of the functions above.

```{r}
p_load(TSstudio)

ts_info(ngc)
```

## Multivariate time series objects

When you have multivariate time series data, you need to use the `mts` (multiple time series) class. This combines the functionality of the `ts` and `matrix` classes.

```{r}
data("Coffee_Prices")
head(Coffee_Prices)

ts_info(Coffee_Prices)
```
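An `mts` object is a matrix with time series attributes. A minimal way to build one is to `cbind()` two `ts` objects that share a frequency, since `cbind()` aligns them on their timestamps. The toy series `a` and `b` are illustrative and need only base R.

```{r}
# Two toy monthly series with the same frequency (illustrative values)
a <- ts(c(1, 2, 3, 4, 5, 6), start = c(2020, 1), frequency = 12)
b <- ts(c(10, 20, 30, 40, 50, 60), start = c(2020, 1), frequency = 12)

# cbind() on ts objects aligns them by time and returns a multivariate series
m <- cbind(a, b)

class(m)      # includes "mts", "ts", and "matrix"
dim(m)        # 6 rows (observations) x 2 columns (variables)
frequency(m)  # the shared frequency, 12
```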

## Creating a `ts` object

```{r}
my_ts1 <- ts(data = 1:60,
             start = c(2010, 1),
             end = c(2014, 12),
             frequency = 12)

ts_info(my_ts1)

my_ts1
```
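The `end` argument is optional. When it is omitted, `ts()` infers it from `length(data)`, `start`, and `frequency`. When both `start` and `end` are given, check the result: if they span more observations than `data` supplies, `ts()` recycles the values without raising an error. The following sketch uses toy data and base R only.

```{r}
# end omitted: inferred as start + (length(data) - 1) / frequency
x <- ts(data = 1:60, start = c(2010, 1), frequency = 12)
end(x)         # 2014 12

# start and end span 12 months but only 6 values are supplied:
# ts() recycles the data to fill the span
y <- ts(data = 1:6, start = c(2010, 1), end = c(2010, 12), frequency = 12)
length(y)      # 12
as.numeric(y)  # 1 2 3 4 5 6 1 2 3 4 5 6
```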

Now we will work through the typical process of converting data from a `data.frame` to a `ts` object.

```{r}
library(tidyverse)

# load the data
data("US_indicators")
str(US_indicators)
```

For now, we will only convert the vehicle sales into a `ts` object.

```{r}
tvs <- 
  US_indicators %>% 
  select(Date, `Vehicle Sales`) %>% 
  arrange(Date)

head(tvs)
```

Next, we need to define the start (or end) of the series. In this case, the series starts in January 1976, so we can define it as `start = c(1976, 1)`. Alternatively, we can capture the starting point programmatically:

```{r}
library(lubridate)

start_point <- c(year(min(tvs$Date)), month(min(tvs$Date)))
start_point
```
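To avoid the **lubridate** dependency, you can compute the same start point with base R's `format()`. This sketch uses a hypothetical `dates` vector holding the first three dates of `US_indicators` (1976-01-31, 1976-02-29, 1976-03-31). With the real data you would use `tvs$Date` instead.

```{r}
# Hypothetical stand-in for tvs$Date (first three dates of US_indicators)
dates <- as.Date(c("1976-01-31", "1976-02-29", "1976-03-31"))

first_date <- min(dates)
# format() extracts the year and month as strings; convert them to integers
start_point_base <- c(as.integer(format(first_date, "%Y")),
                      as.integer(format(first_date, "%m")))
start_point_base
```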

Now we build the series:

```{r}
tvs_ts <- ts(data = tvs$`Vehicle Sales`,
             start = start_point,
             frequency = 12)
```

One of the main limitations of the `ts` class is that its timestamp supports only two elements: the cycle and the cycle unit. For example, when we converted `tvs` into a `ts` object, we lost the day component, because `ts` can only store the year and month.
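The following sketch (base R only) shows what is lost. It uses a hypothetical `dates` vector with the month-end dates that open `US_indicators`, together with three illustrative values. The index stores each month as the decimal `year + (month - 1) / 12`. You can recover the year and month from it, but rebuilding the original dates means guessing the day (here, the first of the month).

```{r}
# Hypothetical month-end dates (as in US_indicators) and illustrative values
dates <- as.Date(c("1976-01-31", "1976-02-29", "1976-03-31"))
x <- ts(c(885, 995, 1244), start = c(1976, 1), frequency = 12)

# The index is a decimal year: year + (month - 1) / 12
idx <- as.numeric(time(x))
yr <- as.integer(floor(idx + 1e-8))
mo <- as.integer(round((idx - yr) * 12) + 1)

# Rebuilding dates from the index has to assume a day (here, the 1st)
rebuilt <- as.Date(sprintf("%d-%02d-01", yr, mo))
rebuilt == dates  # FALSE for every observation: the month-end days are gone
```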

## Creating an `mts` object

```{r}
US_indicators <- arrange(US_indicators, Date)

US_indicators_ts <- ts(data = select(US_indicators, `Vehicle Sales`, 
                                     `Unemployment Rate`),
                       start = c(year(min(tvs$Date)), month(min(tvs$Date))),
                       frequency = 12)

ts_info(US_indicators_ts)
```

## Setting the series frequency

Setting the frequency of a series sets the length of a cycle.

$$
\text{Frequency} = \frac{\text{cycle length}}{\text{time interval between observations}}
$$
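A few worked cases of the formula follow. They match the series used in this chapter: the quarterly `ngc`, the monthly `tvs_ts`, and the daily series with a weekly cycle that follows. In each ratio, the numerator and denominator are expressed in the same time unit.

$$
\begin{aligned}
\text{Quarterly data, yearly cycle:} \quad & \frac{12 \text{ months}}{3 \text{ months}} = 4 \\
\text{Monthly data, yearly cycle:} \quad & \frac{12 \text{ months}}{1 \text{ month}} = 12 \\
\text{Daily data, weekly cycle:} \quad & \frac{7 \text{ days}}{1 \text{ day}} = 7
\end{aligned}
$$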

In this example, we will see how the frequency setting affects the structure of the `ts` object output. First, we simulate just under ten years (3,650 days) of daily data.

```{r}
set.seed(1234)  # make the simulated values reproducible
daily_df <- data.frame(date = seq.Date(from = as.Date("2010-01-01"),
                                       length.out = 365 * 10, by = "day"),
                       y = rnorm(365 * 10, mean = 15, sd = 2))

str(daily_df)
```

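Next, we convert the `y` column into a `ts` object with a weekly cycle (`frequency = 7`), using the day of the week of the first date as the starting cycle unit:

```{r}
# Pass only the values: given the whole data.frame, ts() would also convert
# the date column and return a two-column mts object
days_week_ts <- ts(daily_df$y,
                   start = c(1, wday(min(daily_df$date))),
                   frequency = 7)

ts_info(days_week_ts)
```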
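To see how the choice of frequency changes the structure, the next sketch builds two `ts` objects from the same 3,650 daily values. It uses base R only with freshly simulated values. The first object has a weekly cycle starting at unit 6: 2010-01-01 was a Friday, which `lubridate::wday()` numbers 6 by default. The second has an approximate yearly cycle of 365 days, ignoring leap days, and starts on day 1. The data are identical, but the number and length of the cycles differ.

```{r}
set.seed(1234)
y <- rnorm(365 * 10, mean = 15, sd = 2)

# Weekly cycle: 7 units per cycle; 2010-01-01 (a Friday) is unit 6
weekly <- ts(y, start = c(1, 6), frequency = 7)

# Yearly cycle (ignoring leap days): 365 units per cycle, starting on day 1
yearly <- ts(y, start = c(2010, 1), frequency = 365)

end(weekly)  # 523 1: cycle indices run from 1 to 523
end(yearly)  # 2019 365
```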

# Data manipulation of `ts` objects

## The `window()` function

The `window()` function subsets a `ts` object based on a time range. Its main arguments are `start` and `end`. Let's use `window()` to extract all the observations of the year 2005 from the `ngc` series:

```{r}
window(ngc, start = c(2005, 1), end = c(2005, 4))
```
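Unlike ordinary bracket subsetting, `window()` returns a `ts` object, so the timestamps survive. The toy series `x` below (base R, illustrative values) makes the difference visible:

```{r}
x <- ts(1:16, start = c(2004, 1), frequency = 4)

# window() keeps the time series attributes
w <- window(x, start = c(2005, 1), end = c(2005, 4))
is.ts(w)   # TRUE
start(w)   # 2005 1

# Logical subsetting on time() returns the same values as a plain vector
v <- x[floor(time(x)) == 2005]
is.ts(v)   # FALSE
```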

We can also extract a specific cycle unit from the series. Say we're interested in extracting all the observations that occurred in the third quarter of each year. This can be done by setting `start` to the third quarter of the first year and `frequency` to 1, so that `window()` keeps every fourth observation from that point on:

```{r}
window(ngc, start = c(2000, 3), frequency = 1)
```
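You can also pull out the same quarter by filtering on `cycle()`. That approach returns a plain vector rather than a `ts` object with `frequency = 1`. Here is a sketch on a toy series `x` (illustrative values, base R):

```{r}
x <- ts(1:12, start = c(2000, 1), frequency = 4)

# window() with frequency = 1 keeps every 4th observation from 2000 Q3 onward
q3 <- window(x, start = c(2000, 3), frequency = 1)
as.numeric(q3)   # 3 7 11
frequency(q3)    # 1

# Equivalent values, selected by position within the yearly cycle
q3_values <- x[cycle(x) == 3]
```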

## Aggregating `ts` objects

The `aggregate()` function splits the data into subsets, computes a summary statistic for each subset, and returns the results as a `ts` or `data.frame` object. Let's use `aggregate()` to convert the `ngc` series from quarterly to yearly frequency (`nfrequency = 1`) by summing the four quarters of each year (`FUN = "sum"`):

```{r}
ngc_yearly <- aggregate(ngc, nfrequency = 1, FUN = "sum")
ngc_yearly
```
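Conceptually, `aggregate()` with `nfrequency = 1` groups the quarters of each year and applies `FUN` to each group. The sketch below (base R) types in the first eight quarterly values of the NGC series, covering 2000 and 2001, by hand. It reproduces the yearly sums with `tapply()` and also computes a yearly average:

```{r}
x <- ts(c(2050.6, 1513.1, 1475.0, 2587.5,
          2246.6, 1444.4, 1494.1, 2120.2),
        start = c(2000, 1), frequency = 4)

# Yearly totals via aggregate()
yearly_sum <- aggregate(x, nfrequency = 1, FUN = sum)

# The same totals computed by hand: group by year, then sum
manual_sum <- tapply(as.numeric(x), floor(as.numeric(time(x))), sum)

# A different summary statistic: the average quarterly value per year
yearly_mean <- aggregate(x, nfrequency = 1, FUN = mean)
```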

## Creating lags and leads for `ts` objects

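The `lag()` function from the **stats** package can be used to create lags or leads for `ts` objects. Do not confuse it with the `lag()` function from the **dplyr** package, which masks it once the **tidyverse** is loaded. It shifts the time index rather than the values: with `k = -4`, the quarterly series is shifted forward by four quarters (one year), so its first observation is now stamped 2001 Q1.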

```{r}
ngc_lag4 <- stats::lag(ngc, k = -4)

ts_info(ngc_lag4)
```
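Because `stats::lag()` only moves the time index, the shift becomes visible once the original and shifted series are aligned with `cbind()`. A negative `k` produces a lag and a positive `k` a lead. Here is a sketch on a toy quarterly series `x` (illustrative values, base R):

```{r}
x <- ts(1:8, start = c(2000, 1), frequency = 4)

x_lag4  <- stats::lag(x, k = -4)  # lag: index shifted forward one year
x_lead4 <- stats::lag(x, k = 4)   # lead: index shifted back one year

start(x_lag4)   # 2001 1
start(x_lead4)  # 1999 1

# cbind() aligns the series on time; in 2001 Q1 the lagged column holds the
# value observed in 2000 Q1
aligned <- cbind(x, x_lag4)
aligned[5, ]
```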

# Visualizing `ts` and `mts` objects

## The `plot.ts()` function

Plotting a `ts` object:

```{r}
plot.ts(tvs_ts,
        main = "US Monthly Total Vehicle Sales",
        ylab = "Thousands of Vehicles",
        xlab = "Time")
```

Plotting an `mts` object:

```{r}
plot.ts(US_indicators_ts,
        plot.type = "multiple",
        main = "US Monthly Vehicle Sales vs. Unemployment Rate",
        xlab = "Time")
```

## The **dygraphs** package

The **dygraphs** package is an R interface to the `dygraphs` JavaScript charting library. The `dyRangeSelector()` call adds an interactive range selector for zooming along the time axis:

```{r}
p_load(dygraphs)

dygraph(tvs_ts,
        main = "US Monthly Total Vehicle Sales",
        ylab = "Thousands of Vehicles") %>% 
  dyRangeSelector()
```

For the `US_indicators_ts` series, we will add a second *y*-axis, which lets us plot and compare two series that are on different scales:

```{r}
dygraph(US_indicators_ts,
        main = "US Monthly Vehicle Sales vs. Unemployment Rate") %>% 
  dyAxis("y", label = "Vehicle Sales") %>% 
  dyAxis("y2", label = "Unemployment Rate") %>% 
  dySeries("Vehicle Sales", axis = "y", color = "green") %>% 
  dySeries("Unemployment Rate", axis = "y2", color = "red") %>% 
  dyLegend(width = 400)
```

## The **TSstudio** package

```{r}
p_load(TSstudio)

ts_plot(tvs_ts,
        title = "US Monthly Total Vehicle Sales",
        Ytitle = "Thousands of Vehicles",
        slider = TRUE)
```

Setting `slider = TRUE` adds an interactive slider for the *x*-axis. For an `mts` object, `type = "multiple"` plots each series in its own panel:

```{r}
ts_plot(US_indicators_ts,
        title = "US Monthly Vehicle Sales vs. Unemployment Rate",
        type = "multiple")
```
